{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n=======================================================================\nShrinkage covariance estimation: LedoitWolf vs OAS and max-likelihood\n=======================================================================\n\nWhen working with covariance estimation, the usual approach is to use\na maximum likelihood estimator, such as the\n:class:`sklearn.covariance.EmpiricalCovariance`. It is unbiased, i.e. it\nconverges to the true (population) covariance when given many\nobservations. However, it can also be beneficial to regularize it, in\norder to reduce its variance; this, in turn, introduces some bias. This\nexample illustrates the simple regularization used in\n`shrunk_covariance` estimators. In particular, it focuses on how to\nset the amount of regularization, i.e. how to choose the bias-variance\ntrade-off.\n\nHere we compare 3 approaches:\n\n* Setting the parameter by cross-validating the likelihood on three folds\n  according to a grid of potential shrinkage parameters.\n\n* A close formula proposed by Ledoit and Wolf to compute\n  the asymptotically optimal regularization parameter (minimizing a MSE\n  criterion), yielding the :class:`sklearn.covariance.LedoitWolf`\n  covariance estimate.\n\n* An improvement of the Ledoit-Wolf shrinkage, the\n  :class:`sklearn.covariance.OAS`, proposed by Chen et al. Its\n  convergence is significantly better under the assumption that the data\n  are Gaussian, in particular for small samples.\n\nTo quantify estimation error, we plot the likelihood of unseen data for\ndifferent values of the shrinkage parameter. We also show the choices by\ncross-validation, or with the LedoitWolf and OAS estimates.\n\nNote that the maximum likelihood estimate corresponds to no shrinkage,\nand thus performs poorly. The Ledoit-Wolf estimate performs really well,\nas it is close to the optimal and is computational not costly. In this\nexample, the OAS estimate is a bit further away. Interestingly, both\napproaches outperform cross-validation, which is significantly most\ncomputationally costly.\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(__doc__)\n\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom scipy import linalg\n\nfrom sklearn.covariance import LedoitWolf, OAS, ShrunkCovariance, \\\n    log_likelihood, empirical_covariance\nfrom sklearn.model_selection import GridSearchCV\n\n\n# #############################################################################\n# Generate sample data\nn_features, n_samples = 40, 20\nnp.random.seed(42)\nbase_X_train = np.random.normal(size=(n_samples, n_features))\nbase_X_test = np.random.normal(size=(n_samples, n_features))\n\n# Color samples\ncoloring_matrix = np.random.normal(size=(n_features, n_features))\nX_train = np.dot(base_X_train, coloring_matrix)\nX_test = np.dot(base_X_test, coloring_matrix)\n\n# #############################################################################\n# Compute the likelihood on test data\n\n# spanning a range of possible shrinkage coefficient values\nshrinkages = np.logspace(-2, 0, 30)\nnegative_logliks = [-ShrunkCovariance(shrinkage=s).fit(X_train).score(X_test)\n                    for s in shrinkages]\n\n# under the ground-truth model, which we would not have access to in real\n# settings\nreal_cov = np.dot(coloring_matrix.T, coloring_matrix)\nemp_cov = empirical_covariance(X_train)\nloglik_real = -log_likelihood(emp_cov, linalg.inv(real_cov))\n\n# #############################################################################\n# Compare different approaches to setting the parameter\n\n# GridSearch for an optimal shrinkage coefficient\ntuned_parameters = [{'shrinkage': shrinkages}]\ncv = GridSearchCV(ShrunkCovariance(), tuned_parameters)\ncv.fit(X_train)\n\n# Ledoit-Wolf optimal shrinkage coefficient estimate\nlw = LedoitWolf()\nloglik_lw = lw.fit(X_train).score(X_test)\n\n# OAS coefficient estimate\noa = OAS()\nloglik_oa = oa.fit(X_train).score(X_test)\n\n# #############################################################################\n# Plot results\nfig = plt.figure()\nplt.title(\"Regularized covariance: likelihood and shrinkage coefficient\")\nplt.xlabel('Regularization parameter: shrinkage coefficient')\nplt.ylabel('Error: negative log-likelihood on test data')\n# range shrinkage curve\nplt.loglog(shrinkages, negative_logliks, label=\"Negative log-likelihood\")\n\nplt.plot(plt.xlim(), 2 * [loglik_real], '--r',\n         label=\"Real covariance likelihood\")\n\n# adjust view\nlik_max = np.amax(negative_logliks)\nlik_min = np.amin(negative_logliks)\nymin = lik_min - 6. * np.log((plt.ylim()[1] - plt.ylim()[0]))\nymax = lik_max + 10. * np.log(lik_max - lik_min)\nxmin = shrinkages[0]\nxmax = shrinkages[-1]\n# LW likelihood\nplt.vlines(lw.shrinkage_, ymin, -loglik_lw, color='magenta',\n           linewidth=3, label='Ledoit-Wolf estimate')\n# OAS likelihood\nplt.vlines(oa.shrinkage_, ymin, -loglik_oa, color='purple',\n           linewidth=3, label='OAS estimate')\n# best CV estimator likelihood\nplt.vlines(cv.best_estimator_.shrinkage, ymin,\n           -cv.best_estimator_.score(X_test), color='cyan',\n           linewidth=3, label='Cross-validation best estimate')\n\nplt.ylim(ymin, ymax)\nplt.xlim(xmin, xmax)\nplt.legend()\n\nplt.show()"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.9"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}